Obj2Text: Generating Visually Descriptive Language from Object Layouts

Authors

Xuwang Yin

Vicente Ordonez

Abstract

Generating captions for images is a task that has recently received considerable attention. In this work we focus on caption generation for abstract scenes, or object layouts, where the only information provided is a set of objects and their locations. We propose OBJ2TEXT, a sequence-to-sequence model that encodes a set of objects and their locations as an input sequence using an LSTM network, and decodes this representation using an LSTM language model. We show that our model, despite encoding object layouts as a sequence, can represent spatial relationships between objects and generate descriptions that are globally coherent and semantically relevant. We test our approach on the task of object-layout captioning, using only object annotations as inputs. We additionally show that our model, combined with a state-of-the-art object detector, improves the CIDEr score of an image captioning model from 0.863 to 0.950 on the test benchmark of the standard MS-COCO Captioning task.
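As a rough illustration of what "encoding a set of objects and their locations as an input sequence" could look like, the sketch below turns a layout into a list of fixed-length vectors that a sequence encoder such as an LSTM might consume. It is not the authors' implementation: the function name `encode_layout`, the one-hot category code, the box normalization, and the choice to keep the objects in their input order are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout item: (category name, x, y, width, height) in pixels.
LayoutObject = Tuple[str, float, float, float, float]


def encode_layout(
    objects: Sequence[LayoutObject],
    categories: Sequence[str],
    image_width: float,
    image_height: float,
) -> List[List[float]]:
    """Turn an object layout into a sequence of fixed-length feature vectors.

    Each vector is a one-hot category code followed by the box position and
    size, normalized by the image dimensions. Objects keep their input order
    (an assumption; the abstract does not specify an ordering).
    """
    index = {name: i for i, name in enumerate(categories)}
    sequence = []
    for name, x, y, w, h in objects:
        one_hot = [0.0] * len(categories)
        one_hot[index[name]] = 1.0
        location = [
            x / image_width,
            y / image_height,
            w / image_width,
            h / image_height,
        ]
        sequence.append(one_hot + location)
    return sequence


# Example layout with made-up boxes on a 640x480 image.
layout = [
    ("person", 100.0, 50.0, 200.0, 400.0),
    ("dog", 320.0, 240.0, 160.0, 120.0),
]
features = encode_layout(layout, ["person", "dog", "frisbee"], 640.0, 480.0)
```

Each element of `features` has one entry per category plus four location values, so the whole layout becomes a sequence with one step per object. A learned category embedding could replace the one-hot code without changing the overall shape of the idea.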

Similar resources

Baby Talk: Understanding and Generating Image Descriptions

We posit that visually descriptive language offers computer vision researchers both information about the world, and information about how people describe the world. The potential benefit from this source is made more significant due to the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploi...

Synthesis of Indexing Expressions for Complex Data Layouts

We present a technique for generating and optimizing expressions for indexing complex array layouts. Our technique is built around a declarative, domain-specific layout language that provides support for arbitrarily-nested row-major, column-major, Z-Morton, and Hilbert curve layouts. To maintain programmability, we maintain a ‘logical,’ two-dimensional view of the data in physical memory, allow...

UML Modeling for Visually-Impaired Persons

Software modeling is generally a collaborative activity and typically involves graphical diagrams. The Unified Modeling Language (UML) is the de facto standard for modeling object-oriented software. It provides notations for modeling a system’s structural information (e.g. databases, sensors, controllers, etc.), and behavior, depicting the functionality of the software. Because UML relies heavi...

AVDT - Automatic Visualization of Descriptive Texts

Expressing mental images visually as 3D scenes is a time-consuming challenge. Therefore, we employ natural language to facilitate the creation of virtual environments. In this paper, we present a framework, which automatically converts an arbitrary descriptive text into a representative 3D scene. Our system parses a user-written input text, extracts information using techniques from Natural Lan...

A Graphical Data Model For CASE

Computer-aided software engineering (CASE) applications involve several special data modeling requirements, not the least of which is the need to store graphical representations of complex descriptive data. This paper describes the EARNG data model, which is designed to serve as the basis for storage of such data in the context of CASE support tools. The model enables integrated storage of graph...

Publication date: 2017
